Abstract

This work is a simple extension of [1]. We apply the concepts of information geometry to study the mean-field approximation for a general class of quantum statistical models, namely the higher-order quantum Boltzmann machines (QBMs). The states we consider are assumed to have at most third-order interactions with deterministic coupling coefficients. Such states, taken together, can be shown to form a quantum exponential family and thus can be viewed as a smooth manifold. In our work, we explicitly obtain naive mean-field equations for the third-order classical and quantum Boltzmann machines and demonstrate how some information geometrical concepts, particularly the exponential and mixture projections used to study the naive mean-field approximation in [1], can be extended to a more general case. Though our results do not differ much from those in [1], we emphasize the validity and the importance of the information geometrical point of view for higher dimensional classical and quantum statistical models.

Keywords: ..., quantum relative entropy, quantum exponential family

1 Introduction

The mean-field approximation uses a simple tractable family of density operators to calculate quantities related to a complex density operator including mutual interactions. Information geometry, on the other hand, studies the intrinsic geometrical structure existing in the manifold of density operators [2]. Many authors have applied the mean-field approximation to classical statistical models like classical Boltzmann machines (CBMs) [3] and have also discussed its properties from the information geometrical point of view [4, 5]. In this work, we apply mean-field theory to the third-order CBMs and QBMs and derive the naive mean-field equations using information geometrical concepts.

2 Information geometry of mean-field approximation for third-order CBMs

Let us consider a network of $n$ elements numbered $1, 2, \ldots, n$. Let the value of each element $i \in \{1, 2, \ldots, n\}$ be $x_i \in \{-1, +1\}$. Then a state of the network can be represented as $x = (x_1, x_2, \ldots, x_n) \in \{-1, +1\}^n$. Each element $i \in \{1, 2, \ldots, n\}$ carries a threshold value $h_i \in \mathbb{R}$. The network also has a real-valued parameter $w_{ij}$ for each pair of elements $\{i, j\}$, which is called the coupling coefficient between $i$ and $j$. These parameters are assumed to satisfy the conditions $w_{ij} = w_{ji}$, $w_{ii} = 0$. The other real-valued parameter is $v_{ijk}$, which is symmetric on all pairs of indices. The equilibrium (stationary) distribution is given by the probability distributions of the form

$$p(x, h, w, v) = \exp\Big\{ \sum_{i} h_i x_i + \sum_{i<j} w_{ij} x_i x_j + \sum_{i<j<k} v_{ijk} x_i x_j x_k - \psi(h, w, v) \Big\} \tag{1}$$

with

$$\psi(h, w, v) = \log \sum_{x} \exp\Big\{ \sum_{i} h_i x_i + \sum_{i<j} w_{ij} x_i x_j + \sum_{i<j<k} v_{ijk} x_i x_j x_k \Big\}, \tag{2}$$

where $x = (x_1, \ldots, x_n) \in \{-1, +1\}^n$. Thus, noting that the correspondence $p_{h,w,v} \leftrightarrow (h, w, v)$ is one to one, we can, at least mathematically, identify each third-order CBM [6] with its equilibrium probability distribution.
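As a minimal sketch of equations (1) and (2), assuming a network small enough that all $2^n$ states can be enumerated, the following Python fragment evaluates the equilibrium distribution directly. The function names, the dictionary layout for $w$ and $v$, and the numerical parameter values are illustrative assumptions and do not come from the source.

```python
import itertools
import math

def exponent(x, h, w, v):
    """sum_i h_i x_i + sum_{i<j} w_ij x_i x_j + sum_{i<j<k} v_ijk x_i x_j x_k."""
    total = sum(hi * xi for hi, xi in zip(h, x))
    total += sum(c * x[i] * x[j] for (i, j), c in w.items())
    total += sum(c * x[i] * x[j] * x[k] for (i, j, k), c in v.items())
    return total

def cbm_distribution(h, w, v):
    """Return {state: probability} following eqs. (1) and (2) by enumeration."""
    states = list(itertools.product((-1, 1), repeat=len(h)))
    weights = [math.exp(exponent(x, h, w, v)) for x in states]
    z = sum(weights)  # partition function exp(psi(h, w, v))
    return {x: wt / z for x, wt in zip(states, weights)}

# Hypothetical parameters for n = 3 (0-based indices, keys sorted as i < j < k).
h = [0.2, -0.1, 0.3]
w = {(0, 1): 0.5, (1, 2): -0.4}
v = {(0, 1, 2): 0.25}
p = cbm_distribution(h, w, v)
total_probability = sum(p.values())  # should equal 1 up to rounding
```

The enumeration costs $O(2^n)$ operations, which is exactly the difficulty that motivates the mean-field approximation for large $n$.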

Many good properties of such networks are consequences of the fact that the equilibrium distributions form an exponential family. Here, we briefly discuss this important aspect of the CBM [7]. Let $\mathcal{X}$ be a finite set or, more generally, a measurable space with an underlying measure $d\mu$. We denote the set of positive probability distributions (probability mass functions for a finite $\mathcal{X}$ and probability density functions for a general $(\mathcal{X}, d\mu)$) on $\mathcal{X}$ by $\mathcal{P} = \mathcal{P}(\mathcal{X})$. When a family of distributions, say

$$\mathcal{M} = \{ p_\theta \mid \theta = (\theta^i);\ i = 1, \ldots, n \} \subset \mathcal{P}, \tag{3}$$

is represented in the form

$$p_\theta(x) = \exp\Big\{ c(x) + \sum_{i=1}^{n} \theta^i f_i(x) - \psi(\theta) \Big\}, \quad x \in \mathcal{X}, \tag{4}$$

$\mathcal{M}$ is called an exponential family. Here, $\theta^i$; $i = 1, \ldots, n$ are $\mathbb{R}$-valued parameters, $c$ and $f_i$ are functions on $\mathcal{X}$ and $\psi(\theta)$ is a real-valued convex function. Further, we assume that the correspondence $\theta \mapsto p_\theta$ is one to one. These $\theta = (\theta^i)$ are called the natural coordinates of $\mathcal{M}$.

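Now, for the exponential family $\mathcal{M}$, if we let

$$\eta_i(\theta) = E_\theta[f_i] = \sum_{x} p_\theta(x) f_i(x),$$

then $\eta = (\eta_i)$ and $\theta = (\theta^i)$ are in one-to-one correspondence. That is, we can also use $\eta$ instead of $\theta$ to specify an element of $\mathcal{M}$. These $(\eta_i)$ are called the expectation coordinates of $\mathcal{M}$. The expectation coordinates are, in general, represented as

$$\eta_i = \partial_i \psi(\theta) \quad \Big( \partial_i = \frac{\partial}{\partial \theta^i} \Big). \tag{5}$$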
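Relation (5) can be checked numerically on a small example. The sketch below assumes a three-element third-order CBM, takes the seven monomials $x_i$, $x_i x_j$, $x_1 x_2 x_3$ as the functions $f_i$, and compares the enumerated expectations with a central finite-difference gradient of $\psi$. All names and the values in `theta` are hypothetical choices made for illustration.

```python
import itertools
import math

STATES = list(itertools.product((-1, 1), repeat=3))

def features(x):
    """Sufficient statistics f(x) of a 3-element third-order CBM."""
    x0, x1, x2 = x
    return [x0, x1, x2, x0 * x1, x0 * x2, x1 * x2, x0 * x1 * x2]

def psi(theta):
    """Log-partition function psi(theta) = log sum_x exp(theta . f(x))."""
    return math.log(sum(
        math.exp(sum(t * f for t, f in zip(theta, features(x)))) for x in STATES))

def expectation_coords(theta):
    """eta_i = E_theta[f_i], computed by direct enumeration."""
    z = math.exp(psi(theta))
    eta = [0.0] * len(theta)
    for x in STATES:
        f = features(x)
        p = math.exp(sum(t * fi for t, fi in zip(theta, f))) / z
        for i, fi in enumerate(f):
            eta[i] += p * fi
    return eta

def grad_psi(theta, eps=1e-6):
    """Central finite-difference approximation of d psi / d theta^i."""
    grad = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += eps
        dn[i] -= eps
        grad.append((psi(up) - psi(dn)) / (2 * eps))
    return grad

theta = [0.2, -0.1, 0.3, 0.5, 0.0, -0.4, 0.25]  # hypothetical natural coordinates
eta = expectation_coords(theta)
gap = max(abs(a - b) for a, b in zip(eta, grad_psi(theta)))  # small if (5) holds
```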

The set that consists of equilibrium probability distributions of third-order CBM (1) is one example of an exponential family. In addition, threshold values, coupling coefficients (weights) and third-order weights become the natural coordinates while $E_\theta[x_i]$, $E_\theta[x_i x_j]$ and $E_\theta[x_i x_j x_k]$ become expectation coordinates. The notion of exponential family is very important in statistics and information geometry, and is also useful in studying properties of third-order CBMs with their mean-field approximations.

We now consider a hierarchy of exponential families. Let $\mathcal{P}_r$ be the set of $r$th-order CBMs. Then $\mathcal{P}_r$ also turns out to be an exponential family. Thus, we have a hierarchical structure of exponential families $\mathcal{P}_1 \subset \mathcal{P}_2 \subset \cdots \subset \mathcal{P}_r$. In particular, $\mathcal{P}_1$, $\mathcal{P}_2$ and $\mathcal{P}_3$ can be represented by

$$\mathcal{P}_1 = \{ p_1(x_1) \cdots p_n(x_n) \} = \{\text{product distributions}\},$$
$$\mathcal{P}_2 = \{\text{equilibrium distributions of CBM}\} \tag{6}$$

and

$$\mathcal{P}_3 = \{\text{equilibrium distributions of third-order CBM}\} \tag{7}$$

respectively.

We now derive the naive mean-field equation for third-order CBM. When the system size is large, the partition function $\exp(\psi(h, w, v))$ is very difficult to calculate and thus explicit calculation of the expectations $m_i = E_{h,w,v}[x_i]$ is intractable. We are therefore led to seek a good approximation of $m_i$ for a given probability distribution $p_{h,w,v} \in \mathcal{P}_3$.

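First, we consider the subspace $\mathcal{P}_1$ of $\mathcal{P}_3$. We parametrize each distribution in $\mathcal{P}_1$ by $\bar{h}$ and write it as

$$p_{\bar{h}}(x) = \exp\Big\{ \sum_{i} \bar{h}_i x_i - \psi(\bar{h}) \Big\}, \tag{8}$$

where

$$\psi(\bar{h}) = \sum_{i} \log\{ \exp(\bar{h}_i) + \exp(-\bar{h}_i) \}.$$

Then, $\mathcal{P}_1$ forms a submanifold of $\mathcal{P}_3$ specified by $w_{ij} = 0 = v_{ijk}$ and with $\bar{h}_i$ as its coordinates. The expectations $\bar{m}_i := E_{\bar{h}}[x_i]$ form another coordinate system of $\mathcal{P}_1$. For a given $p_{\bar{h}} \in \mathcal{P}_1$, it is easy to obtain $\bar{m}_i = E_{\bar{h}}[x_i]$ from $\bar{h}_i$ because the $x_i$'s are independent. We can calculate $\bar{m}_i$ to be

$$\bar{m}_i = \frac{\partial \psi(\bar{h})}{\partial \bar{h}_i} = \frac{\exp(\bar{h}_i) - \exp(-\bar{h}_i)}{\exp(\bar{h}_i) + \exp(-\bar{h}_i)} = \tanh(\bar{h}_i), \tag{9}$$

from which we obtain

$$\bar{h}_i = \frac{1}{2} \log\Big( \frac{1 + \bar{m}_i}{1 - \bar{m}_i} \Big). \tag{10}$$

The simple idea behind the mean-field approximation for a $p_{h,w,v} \in \mathcal{P}_3$ is to use quantities obtained in the form of expectation with respect to some relevant $p_{\bar{h}} \in \mathcal{P}_1$.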
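The change of coordinates in (9) and (10) is simple enough to state as code. The following sketch, with hypothetical function names and sample values, converts natural coordinates $\bar{h}_i$ of a product distribution to expectation coordinates $\bar{m}_i$ and back.

```python
import math

def h_to_m(h_bar):
    """Eq. (9): expectation coordinates m_i = tanh(h_i) of a product distribution."""
    return [math.tanh(h) for h in h_bar]

def m_to_h(m_bar):
    """Eq. (10): natural coordinates h_i = (1/2) log((1 + m_i) / (1 - m_i))."""
    return [0.5 * math.log((1 + m) / (1 - m)) for m in m_bar]

h_bar = [0.3, -1.2, 0.0]       # hypothetical thresholds
m_bar = h_to_m(h_bar)          # each entry lies strictly between -1 and 1
h_back = m_to_h(m_bar)         # recovers h_bar up to rounding
```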

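Now, we need a suitable criterion to measure the approximation of two probability distributions $q \in \mathcal{P}$ and $p_\theta \in \mathcal{M}$. For the present purpose, we adopt the Kullback-Leibler (KL) divergence (relative entropy)

$$D(q \| p_\theta) = \sum_{x} q(x) \log \frac{q(x)}{p_\theta(x)}. \tag{11}$$

Given $p_\theta = p_{h,w,v} \in \mathcal{P}_3$, its e- (exponential) and m- (mixture) projections (see [2]) onto $\mathcal{P}_1$ are defined by

$$\bar{p}^{(e)} = p_{\bar{h}^{(e)}} := \operatorname*{argmin}_{p_{\bar{h}} \in \mathcal{P}_1} D(p_{\bar{h}} \| p_\theta) \tag{12}$$

and

$$\bar{p}^{(m)} = p_{\bar{h}^{(m)}} := \operatorname*{argmin}_{p_{\bar{h}} \in \mathcal{P}_1} D(p_\theta \| p_{\bar{h}}) \tag{13}$$

respectively, where

$$\bar{h}^{(e)} = \operatorname*{argmin}_{\bar{h} = (\bar{h}_i)} D(p_{\bar{h}} \| p_\theta) \tag{14}$$

and

$$\bar{h}^{(m)} = \operatorname*{argmin}_{\bar{h} = (\bar{h}_i)} D(p_\theta \| p_{\bar{h}}). \tag{15}$$

As necessary conditions, we have

$$\frac{\partial}{\partial \bar{h}_i} D(p_{\bar{h}} \| p_\theta) = 0 \tag{16}$$

and

$$\frac{\partial}{\partial \bar{h}_i} D(p_\theta \| p_{\bar{h}}) = 0, \tag{17}$$

which are weaker than (14) and (15). But sometimes (16) and (17) are chosen to be the definitions of e-, m-projections respectively for convenience. It can be shown that the m-projection $\bar{p}^{(m)}$ gives the true values of expectations, that is $\bar{m}_i = m_i$ or $E_\theta[x_i] = E_{\bar{h}}[x_i]$ for $\bar{h} = \bar{h}^{(m)}$. The e-projection $\bar{p}^{(e)}$ from $\mathcal{P}_3$ onto $\mathcal{P}_1$ gives the naive mean-field approximation for third-order CBM. Now we derive the naive mean-field equation for third-order CBM following [4]. Recall that the equilibrium distribution for third-order CBM is given by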

$$p = p(x, h, w, v) = \exp\Big\{ \sum_{i} h_i x_i + \sum_{i<j} w_{ij} x_i x_j + \sum_{i<j<k} v_{ijk} x_i x_j x_k - \psi(h, w, v) \Big\} \tag{18}$$

with

$$\psi(h, w, v) = \log \sum_{x} \exp\Big\{ \sum_{i} h_i x_i + \sum_{i<j} w_{ij} x_i x_j + \sum_{i<j<k} v_{ijk} x_i x_j x_k \Big\}, \tag{19}$$

where $x = (x_1, \ldots, x_n) \in \{-1, +1\}^n$. Now we define another function

$$\phi(p) = \phi(h, w, v) := \sum_{i<j<k} v_{ijk} E_p[x_i x_j x_k] + \sum_{i<j} w_{ij} E_p[x_i x_j] + \sum_{i} h_i E_p[x_i] - \psi(p), \tag{20}$$

which coincides with the negative entropy

$$\phi(p) = \sum_{x} p(x) \log p(x). \tag{21}$$



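In particular, for a product distribution $p_{\bar{h}} \in \mathcal{P}_1$, using $E_{\bar{h}}[x_i] = \bar{m}_i$, we have

$$\phi(p_{\bar{h}}) = \sum_{i} \Big[ \frac{1 + \bar{m}_i}{2} \log \frac{1 + \bar{m}_i}{2} + \frac{1 - \bar{m}_i}{2} \log \frac{1 - \bar{m}_i}{2} \Big]. \tag{22}$$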



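The KL divergence between $p$ and $p_{\bar{h}} \in \mathcal{P}_1$ can be expressed in the following form:

$$\begin{aligned}
D(p_{\bar{h}} \| p) &= \psi(p) + \phi(p_{\bar{h}}) - \sum_{i<j<k} v_{ijk} E_{\bar{h}}[x_i x_j x_k] - \sum_{i<j} w_{ij} E_{\bar{h}}[x_i x_j] - \sum_{i} h_i E_{\bar{h}}[x_i] \\
&= \psi(p) + \phi(p_{\bar{h}}) - \sum_{i<j<k} v_{ijk} \bar{m}_i \bar{m}_j \bar{m}_k - \sum_{i<j} w_{ij} \bar{m}_i \bar{m}_j - \sum_{i} h_i \bar{m}_i \\
&= \psi(p) + \frac{1}{2} \sum_{i} \Big[ (1 + \bar{m}_i) \log \frac{1 + \bar{m}_i}{2} + (1 - \bar{m}_i) \log \frac{1 - \bar{m}_i}{2} \Big] \\
&\quad - \sum_{i<j<k} v_{ijk} \bar{m}_i \bar{m}_j \bar{m}_k - \sum_{i<j} w_{ij} \bar{m}_i \bar{m}_j - \sum_{i} h_i \bar{m}_i.
\end{aligned} \tag{23}$$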
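The closed form (23) can be compared against the definition (11) on a small network. The sketch below assumes $n = 3$ and evaluates $D(p_{\bar{h}} \| p)$ in two ways: state by state from (11), and through the right-hand side of (23) with the negative entropy (22). Function names and parameter values are illustrative assumptions.

```python
import itertools
import math

def exponent(x, h, w, v):
    """sum_i h_i x_i + sum_{i<j} w_ij x_i x_j + sum_{i<j<k} v_ijk x_i x_j x_k."""
    total = sum(hi * xi for hi, xi in zip(h, x))
    total += sum(c * x[i] * x[j] for (i, j), c in w.items())
    total += sum(c * x[i] * x[j] * x[k] for (i, j, k), c in v.items())
    return total

def log_partition(h, w, v):
    states = itertools.product((-1, 1), repeat=len(h))
    return math.log(sum(math.exp(exponent(x, h, w, v)) for x in states))

def neg_entropy(m):
    """phi(p_hbar) of eq. (22) for a product distribution with means m_i."""
    return sum((1 + mi) / 2 * math.log((1 + mi) / 2)
               + (1 - mi) / 2 * math.log((1 - mi) / 2) for mi in m)

def kl_closed_form(m, h, w, v):
    """Right-hand side of eq. (23); exponent(m, ...) evaluates the three sums at m."""
    return log_partition(h, w, v) + neg_entropy(m) - exponent(m, h, w, v)

def kl_direct(m, h, w, v):
    """D(p_hbar || p) from the definition (11), summing over all states."""
    psi = log_partition(h, w, v)
    total = 0.0
    for x in itertools.product((-1, 1), repeat=len(h)):
        q = math.prod((1 + mi * xi) / 2 for mi, xi in zip(m, x))  # product distribution
        total += q * (math.log(q) - (exponent(x, h, w, v) - psi))
    return total

# Hypothetical parameters and means (each mean must lie strictly inside (-1, 1)).
h = [0.2, -0.1, 0.3]
w = {(0, 1): 0.5, (1, 2): -0.4}
v = {(0, 1, 2): 0.25}
m = [0.1, -0.3, 0.4]
difference = abs(kl_direct(m, h, w, v) - kl_closed_form(m, h, w, v))
```

The agreement relies on the factorization $E_{\bar{h}}[x_i x_j] = \bar{m}_i \bar{m}_j$ for $i \neq j$, which holds only for product distributions.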
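Now consider the e-projection (16) from $p \in \mathcal{P}_3$ onto $p_{\bar{h}} \in \mathcal{P}_1$, i.e.

$$\frac{\partial}{\partial \bar{h}_i} D(p_{\bar{h}} \| p) = 0. \tag{24}$$

Noting that $\bar{h}$ and $\bar{m}$ are in one-to-one correspondence, we may consider instead

$$\frac{\partial}{\partial \bar{m}_i} D(p_{\bar{h}} \| p) = 0. \tag{25}$$

Since $\psi(p)$ does not depend on $\bar{m}_i$, we obtain from (23) that

$$\begin{aligned}
\frac{\partial}{\partial \bar{m}_i} D(p_{\bar{h}} \| p) &= \frac{1}{2} \log\Big( \frac{1 + \bar{m}_i}{1 - \bar{m}_i} \Big) - \sum_{\substack{j<k \\ j,k \neq i}} v_{ijk} \bar{m}_j \bar{m}_k - \sum_{j \neq i} w_{ij} \bar{m}_j - h_i \\
&= \bar{h}_i - \sum_{\substack{j<k \\ j,k \neq i}} v_{ijk} \bar{m}_j \bar{m}_k - \sum_{j \neq i} w_{ij} \bar{m}_j - h_i,
\end{aligned} \tag{26}$$

where the second equality is from (10). Thus the naive mean-field equation is obtained from (25) and (26) as

$$\tanh^{-1}(\bar{m}_i) = \sum_{\substack{j<k \\ j,k \neq i}} v_{ijk} \bar{m}_j \bar{m}_k + \sum_{j \neq i} w_{ij} \bar{m}_j + h_i \tag{27}$$

and this is usually written in the form

$$\bar{m}_i = \tanh\Big( \sum_{\substack{j<k \\ j,k \neq i}} v_{ijk} \bar{m}_j \bar{m}_k + \sum_{j \neq i} w_{ij} \bar{m}_j + h_i \Big). \tag{28}$$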
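Equation (28) is a fixed-point condition, so one simple way to look for a solution is to iterate it. The sketch below uses plain sequential updates starting from $\bar{m} = 0$; this iteration scheme, the function names and the weak-coupling parameter values are assumptions made for illustration, and convergence is not guaranteed for strong couplings. For comparison, the exact expectations are obtained by enumeration, which is feasible only for small $n$.

```python
import itertools
import math

def local_field(i, m, h, w, v):
    """Argument of tanh in eq. (28) for element i."""
    field = h[i]
    for (a, b), c in w.items():
        if i == a:
            field += c * m[b]
        elif i == b:
            field += c * m[a]
    for idx, c in v.items():
        if i in idx:
            j, k = (t for t in idx if t != i)  # the two other elements of the triple
            field += c * m[j] * m[k]
    return field

def naive_mean_field(h, w, v, sweeps=200):
    """Iterate m_i <- tanh(local field) sequentially, starting from m = 0."""
    m = [0.0] * len(h)
    for _ in range(sweeps):
        for i in range(len(h)):
            m[i] = math.tanh(local_field(i, m, h, w, v))
    return m

def exact_means(h, w, v):
    """E[x_i] under eq. (1), by enumerating all 2^n states."""
    n = len(h)
    z, sums = 0.0, [0.0] * n
    for x in itertools.product((-1, 1), repeat=n):
        e = sum(h[i] * x[i] for i in range(n))
        e += sum(c * x[i] * x[j] for (i, j), c in w.items())
        e += sum(c * x[i] * x[j] * x[k] for (i, j, k), c in v.items())
        wt = math.exp(e)
        z += wt
        for i in range(n):
            sums[i] += wt * x[i]
    return [s / z for s in sums]

# Hypothetical weak couplings, chosen so that the iteration converges quickly.
h = [0.2, -0.1, 0.3]
w = {(0, 1): 0.1, (1, 2): -0.1}
v = {(0, 1, 2): 0.05}
m_mf = naive_mean_field(h, w, v)
m_exact = exact_means(h, w, v)
residual = max(abs(m_mf[i] - math.tanh(local_field(i, m_mf, h, w, v)))
               for i in range(len(h)))
```

Here `residual` measures how well the returned vector satisfies (28), while the gap between `m_mf` and `m_exact` reflects the approximation error of the naive mean-field equation itself.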

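3 Definition of third-order quantum Boltzmann machines

Let us consider an $n$-element system of quantum spin-half particles. Each element is represented as a quantum spin with local Hilbert space $\mathbb{C}^2$, and the $n$-element system corresponds to $\mathcal{H} = (\mathbb{C}^2)^{\otimes n} \simeq \mathbb{C}^{2^n}$. Let $\mathcal{S}$ be the set of strictly positive states on $\mathcal{H}$;

$$\mathcal{S} = \{ \rho \mid \rho = \rho^* > 0 \text{ and } \mathrm{Tr}\,\rho = 1 \}. \tag{29}$$

Here, each $\rho$ is a $2^n \times 2^n$ matrix; $\rho = \rho^* > 0$ means that $\rho$ is Hermitian and positive definite respectively; and $\mathrm{Tr}\,\rho = 1$ shows that the trace of the density matrix $\rho$ is unity. Now an element of $\mathcal{S}$ is said to have at most $r$th-order interactions if it is written as

$$\begin{aligned}
\rho_\theta &= \exp\Big\{ \sum_{i,s} \theta^{(1)}_{is} \sigma_{is} + \sum_{i<j} \sum_{s,t} \theta^{(2)}_{ijst} \sigma_{is} \sigma_{jt} + \cdots + \sum_{i_1 < \cdots < i_r} \sum_{s_1 \ldots s_r} \theta^{(r)}_{i_1 \ldots i_r s_1 \ldots s_r} \sigma_{i_1 s_1} \cdots \sigma_{i_r s_r} - \psi(\theta) \Big\} \\
&= \exp\Big\{ \sum_{j=1}^{r} \sum_{i_1 < \cdots < i_j} \sum_{s_1 \ldots s_j} \theta^{(j)}_{i_1 \ldots i_j s_1 \ldots s_j} \sigma_{i_1 s_1} \cdots \sigma_{i_j s_j} - \psi(\theta) \Big\}
\end{aligned} \tag{30}$$

with

$$\psi(\theta) = \log \mathrm{Tr} \exp\Big\{ \sum_{j=1}^{r} \sum_{i_1 < \cdots < i_j} \sum_{s_1 \ldots s_j} \theta^{(j)}_{i_1 \ldots i_j s_1 \ldots s_j} \sigma_{i_1 s_1} \cdots \sigma_{i_j s_j} \Big\}, \tag{31}$$

where $\sigma_{is} = I^{\otimes (i-1)} \otimes \sigma_s \otimes I^{\otimes (n-i)}$ and $\theta = (\theta^{(j)}_{i_1 \ldots i_j s_1 \ldots s_j})$. Here, $I$ is the identity matrix on $\mathbb{C}^2$ and $\sigma_s$ for $s \in \{1, 2, 3\}$ are the usual Pauli matrices given by

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$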
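The operators $\sigma_{is}$ can be built explicitly as Kronecker products. The following sketch uses nested Python lists so that it needs no external library; the helper names `kron`, `matmul` and `sigma` are hypothetical, and sites are numbered $1, \ldots, n$ as in the text.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I2 = [[1, 0], [0, 1]]
PAULI = {
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}

def sigma(i, s, n):
    """sigma_{is}: Pauli matrix sigma_s on site i, identity on the other n - 1 sites."""
    out = [[1]]
    for site in range(1, n + 1):
        out = kron(out, PAULI[s] if site == i else I2)
    return out

s13 = sigma(1, 3, 2)                         # sigma_3 on the first of two spins
diagonal = [s13[r][r] for r in range(4)]     # [1, 1, -1, -1]
commute = matmul(sigma(1, 1, 3), sigma(2, 2, 3)) == matmul(sigma(2, 2, 3), sigma(1, 1, 3))
```

Operators acting on different sites commute, which is what allows the products $\sigma_{is} \sigma_{jt}$ with $i < j$ in (30) to be written without regard to ordering.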
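Letting $\mathcal{S}_r$ be the totality of states $\rho_\theta$ of the above form, we have the hierarchy $\mathcal{S}_1 \subset \mathcal{S}_2 \subset \mathcal{S}_3 \subset \cdots \subset \mathcal{S}_n = \mathcal{S}$. Note that $\mathcal{S}_1$ is the set of product states $\rho_1 \otimes \rho_2 \otimes \cdots \otimes \rho_n$.

Corresponding to the classical case, an element of $\mathcal{S}_2$ is called a QBM (see [1]). The third-order quantum Boltzmann machines are given by the elements of $\mathcal{S}_3$ and those states can be explicitly written as

$$\rho_{h,w,v} = \exp\Big\{ \sum_{i,s} h_{is} \sigma_{is} + \sum_{i<j} \sum_{s,t} w_{ijst} \sigma_{is} \sigma_{jt} + \sum_{i<j<k} \sum_{s,t,u} v_{ijkstu} \sigma_{is} \sigma_{jt} \sigma_{ku} - \psi(h, w, v) \Big\} \tag{32}$$

with

$$\psi(h, w, v) = \log \mathrm{Tr} \exp\Big\{ \sum_{i,s} h_{is} \sigma_{is} + \sum_{i<j} \sum_{s,t} w_{ijst} \sigma_{is} \sigma_{jt} + \sum_{i<j<k} \sum_{s,t,u} v_{ijkstu} \sigma_{is} \sigma_{jt} \sigma_{ku} \Big\}, \tag{33}$$

where $h = (h_{is})$, $w = (w_{ijst})$ and $v = (v_{ijkstu})$.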



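4 Some information geometrical concepts for quantum systems

We discuss in this section some information geometrical concepts for quantum systems [2]. Let us consider a manifold $\mathcal{S}$ of density operators and a submanifold $\mathcal{M}$ of $\mathcal{S}$. We define a quantum divergence function from $\rho \in \mathcal{S}$ to $\sigma \in \mathcal{S}$, which in this case turns out to be the quantum relative entropy and its reverse, represented by

$$D^{(-1)}(\rho \| \sigma) := \mathrm{Tr}[\rho (\log \rho - \log \sigma)]; \quad D^{(+1)}(\rho \| \sigma) := \mathrm{Tr}[\sigma (\log \sigma - \log \rho)]. \tag{34}$$

The quantum relative entropy satisfies $D^{(\pm 1)}(\rho \| \sigma) \geq 0$, with $D^{(\pm 1)}(\rho \| \sigma) = 0$ iff $\rho = \sigma$, but it is not symmetric.

Given $\rho \in \mathcal{S}$, the point $\tau^{(\pm 1)} \in \mathcal{M}$ is called the e-, m-projection of $\rho$ to $\mathcal{M}$ when the function $D^{(\pm 1)}(\rho \| \tau)$, $\tau \in \mathcal{M}$, takes a critical value at $\tau^{(\pm 1)}$, that is

$$\frac{\partial}{\partial \xi} D^{(\pm 1)}(\rho \| \tau(\xi)) = 0 \tag{35}$$

at $\tau^{(\pm 1)}$, where $\xi$ is a coordinate system of $\mathcal{M}$. The minimizer of $D^{(\pm 1)}(\rho \| \tau)$, $\tau \in \mathcal{M}$, is the $\pm 1$-projection of $\rho$ to $\mathcal{M}$.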

Next we introduce a quantum version of the exponential family. Suppose that a parametric family

$$\mathcal{M} = \{ \rho_\theta \mid \theta = (\theta^i);\ i = 1, \ldots, m \} \subset \mathcal{S} \tag{36}$$

is represented in the form

$$\rho_\theta = \exp\Big\{ C + \sum_{i=1}^{m} \theta^i F_i - \psi(\theta) \Big\}, \tag{37}$$

where $F_i$ $(i = 1, \ldots, m)$, $C$ are Hermitian operators and $\psi(\theta)$ is a real-valued function. We assume in addition that the operators $\{F_1, \ldots, F_m, I\}$, where $I$ is the identity operator, are linearly independent to ensure that the parametrization $\theta \mapsto \rho_\theta$ is one to one. Then $\mathcal{M}$ forms an $m$-dimensional smooth manifold with a coordinate system $\theta = (\theta^i)$. In this work, we call such an $\mathcal{M}$ a quantum exponential family or QEF for short, with natural coordinates $\theta = (\theta^i)$. Note also that for any $1 \leq k \leq n$ the set $\mathcal{S}_k$ of states (30) forms a QEF, including $\mathcal{S}_1$ of product states, $\mathcal{S}_2$ of QBMs and $\mathcal{S}_3$ of third-order QBMs.

If we let

$$\eta_i(\theta) = \mathrm{Tr}[\rho_\theta F_i], \tag{38}$$

then $\eta = (\eta_i)$ and $\theta = (\theta^i)$ are in one-to-one correspondence. That is, we can also use $\eta$ instead of $\theta$ to specify an element of $\mathcal{M}$. These $(\eta_i)$ are called the expectation coordinates of $\mathcal{M}$.

In particular, the natural coordinates of $\mathcal{S}_3$ are given by $(h, w, v) = (h_{is}, w_{ijst}, v_{ijkstu})$ in (32), while the expectation coordinates are $(m, \mu, l) = (m_{is}, \mu_{ijst}, l_{ijkstu})$ defined by

$$m_{is} = \mathrm{Tr}[\rho_{h,w,v}\, \sigma_{is}], \quad \mu_{ijst} = \mathrm{Tr}[\rho_{h,w,v}\, \sigma_{is} \sigma_{jt}] \quad \text{and} \quad l_{ijkstu} = \mathrm{Tr}[\rho_{h,w,v}\, \sigma_{is} \sigma_{jt} \sigma_{ku}]. \tag{39}$$

On the other hand, the natural coordinates of $\mathcal{S}_1$ are $\bar{h} = (\bar{h}_{is})$ in (43), while the expectation coordinates are $\bar{m} = (\bar{m}_{is})$ defined by

$$\bar{m}_{is} = \mathrm{Tr}[\tau_{\bar{h}}\, \sigma_{is}]. \tag{40}$$

In this case, the correspondence between the two coordinate systems can explicitly be represented as

$$\bar{m}_{is} = \frac{\partial \psi(\bar{h})}{\partial \bar{h}_{is}} = \frac{\bar{h}_{is}}{\|\bar{h}_i\|} \tanh(\|\bar{h}_i\|) \tag{41}$$

or as

$$\bar{h}_{is} = \frac{\bar{m}_{is}}{\|\bar{m}_i\|} \tanh^{-1}(\|\bar{m}_i\|), \tag{42}$$

where $\|\bar{m}_i\| := \sqrt{\sum_s (\bar{m}_{is})^2}$.
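For a single spin, relations (41) and (42) can be verified numerically. The sketch below assumes a one-site state $\tau = \exp(\sum_s \bar{h}_s \sigma_s) / \mathrm{Tr}\exp(\sum_s \bar{h}_s \sigma_s)$, computes the matrix exponential with a truncated Taylor series (a simple stand-in that is adequate only for small matrix norms), and compares $\mathrm{Tr}[\tau \sigma_s]$ with the closed form. All names and the values in `h_bar` are hypothetical.

```python
import math

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(a, terms=40):
    """Matrix exponential by a truncated Taylor series sum_k a^k / k!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def spin_expectations(h_bar):
    """m_s = Tr[tau sigma_s] for the normalized one-spin state tau."""
    ham = [[sum(h_bar[s] * PAULI[s][i][j] for s in range(3)) for j in range(2)]
           for i in range(2)]
    e = expm(ham)
    trace = (e[0][0] + e[1][1]).real
    out = []
    for s in range(3):
        prod = matmul(e, PAULI[s])
        out.append((prod[0][0] + prod[1][1]).real / trace)
    return out

def closed_form(h_bar):
    """Right-hand side of eq. (41) for one site."""
    norm = math.sqrt(sum(x * x for x in h_bar))
    return [x / norm * math.tanh(norm) for x in h_bar]

h_bar = [0.3, -0.4, 0.5]                      # hypothetical nonzero field
m_numeric = spin_expectations(h_bar)
m_formula = closed_form(h_bar)
m_norm = math.sqrt(sum(x * x for x in m_formula))
h_back = [x / m_norm * math.atanh(m_norm) for x in m_formula]  # eq. (42)
```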

5 Information geometry of mean-field approximation for third-order QBMs

5.1 The submanifold of product states and its geometry

In this section, we briefly discuss the set $\mathcal{S}_1$. The elements of $\mathcal{S}_1$ are represented by letting $w = 0$ and $v = 0$ in (32). In the sequel, we write them as

$$\tau_{\bar{h}} = \exp\Big\{ \sum_{i,s} \bar{h}_{is} \sigma_{is} - \psi(\bar{h}) \Big\} \tag{43}$$

by using new symbols $\tau$ and $\bar{h} = (\bar{h}_{is})$ when we wish to make it clear that we are treating $\mathcal{S}_1$ instead of $\mathcal{S}_3$. We have

$$\tau_{\bar{h}} = \bigotimes_{i=1}^{n} \exp\Big\{ \sum_{s} \bar{h}_{is} \sigma_s - \psi_i(\bar{h}_i) \Big\}, \tag{44}$$

where $\bar{h}_i = (\bar{h}_{is})_s$ and

$$\psi_i(\bar{h}_i) = \log \mathrm{Tr} \exp\Big\{ \sum_{s} \bar{h}_{is} \sigma_s \Big\} = \log\{ \exp(\|\bar{h}_i\|) + \exp(-\|\bar{h}_i\|) \} \tag{45}$$

with $\|\bar{h}_i\| := \sqrt{\sum_s (\bar{h}_{is})^2}$. Note that

$$\psi(\bar{h}) = \sum_{i} \psi_i(\bar{h}_i). \tag{46}$$

5.2 The exponential & mixture projections and mean-field approximation

In this section, we derive the naive mean-field equation for third-order QBMs explicitly from the viewpoint of information geometry. Suppose that we are interested in calculating the expectations $m_{is} = \mathrm{Tr}[\rho_{h,w,v}\, \sigma_{is}]$ from given $(h, w, v) = (h_{is}, w_{ijst}, v_{ijkstu})$. Since the direct calculation is intractable in general when the system size is large, we need to employ a computationally efficient approximation method. Mean-field approximation is a well-known technique for this purpose. The simple idea behind the mean-field approximation for a $\rho_{h,w,v} \in \mathcal{S}_3$ is to use quantities obtained in the form of expectation with respect to some relevant $\tau_{\bar{h}} \in \mathcal{S}_1$. T. Tanaka [1] has elucidated the essence of the naive mean-field approximation for classical spin models in terms of e-, m-projections. Our aim is to extend this idea to quantized spin models other than that considered in [1].

In the following arguments, we regard $\mathcal{S}_3$ as a QEF with the natural coordinates $(\theta^\alpha) = (h_{is}, w_{ijst}, v_{ijkstu})$ and the expectation coordinates $(\eta_\alpha) = (m_{is}, \mu_{ijst}, l_{ijkstu})$, where $\alpha$ is an index denoting $\alpha = (i, s)$ or $\alpha = (i, j, s, t)$ or $\alpha = (i, j, k, s, t, u)$. We follow a slightly different method from that of the classical setting to obtain the naive mean-field equation for third-order QBM.

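Recall that the state for third-order QBM (32) is given by

$$\rho_{h,w,v} = \exp\Big\{ \sum_{i,s} h_{is} \sigma_{is} + \sum_{i<j} \sum_{s,t} w_{ijst} \sigma_{is} \sigma_{jt} + \sum_{i<j<k} \sum_{s,t,u} v_{ijkstu} \sigma_{is} \sigma_{jt} \sigma_{ku} - \psi(h, w, v) \Big\} \tag{47}$$

with

$$\psi(h, w, v) = \log \mathrm{Tr} \exp\Big\{ \sum_{i,s} h_{is} \sigma_{is} + \sum_{i<j} \sum_{s,t} w_{ijst} \sigma_{is} \sigma_{jt} + \sum_{i<j<k} \sum_{s,t,u} v_{ijkstu} \sigma_{is} \sigma_{jt} \sigma_{ku} \Big\}, \tag{48}$$

where $h = (h_{is})$, $w = (w_{ijst})$ and $v = (v_{ijkstu})$.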

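Given $\rho_{h,w,v} \in \mathcal{S}_3$, its e-($+1$) and m-($-1$) projections (see [2]) onto $\mathcal{S}_1$ are defined by

$$\bar{\tau}^{(\pm 1)} = \tau_{\bar{h}^{(\pm 1)}} := \operatorname*{argmin}_{\tau_{\bar{h}} \in \mathcal{S}_1} D^{(\pm 1)}(\rho_{h,w,v} \| \tau_{\bar{h}}). \tag{49}$$

We denote by $\bar{m}^{(\pm 1)}_{is}[\rho_{h,w,v}]$ the expectation of $\sigma_{is}$ with respect to $\bar{\tau}^{(\pm 1)}$, that is $\mathrm{Tr}[\bar{\tau}^{(\pm 1)} \sigma_{is}]$. Then $\bar{m}^{(\pm 1)}_{is}[\rho_{h,w,v}]$ is determined by

$$\frac{\partial}{\partial \bar{m}_{is}} D^{(\pm 1)}(\rho_{h,w,v} \| \tau_{\bar{h}}) = 0. \tag{50}$$

From the information geometrical point of view, $\bar{\tau}^{(\pm 1)} \in \mathcal{S}_1$ is the $\pm 1$-geodesic projection of $\rho_{h,w,v}$ to $\mathcal{S}_1$ in the sense that the $\pm 1$-geodesic connecting $\rho_{h,w,v}$ and $\bar{\tau}$ is orthogonal to $\mathcal{S}_1$ at $\bar{\tau} = \bar{\tau}^{(\pm 1)}$. Since $\bar{\tau}^{(-1)}$ is the m-projection of $\rho_{h,w,v}$ to $\mathcal{S}_1$, we have $\bar{m}^{(-1)}_{is} = m_{is}[\rho_{h,w,v}]$, which is the quantity we want to obtain. This relation can be directly calculated by solving

$$\frac{\partial}{\partial \bar{m}_{is}} D^{(-1)}(\rho_{h,w,v} \| \tau_{\bar{h}}) = 0, \tag{51}$$

because this is equivalent to

$$\frac{\partial}{\partial \bar{m}_{is}} \mathrm{Tr}[\rho_{h,w,v} (\log \rho_{h,w,v} - \log \tau_{\bar{h}})] = 0. \tag{52}$$

Indeed, since $\log \tau_{\bar{h}} = \sum_{i,s} \bar{h}_{is} \sigma_{is} - \psi(\bar{h})$, the part of the divergence that depends on $\bar{h}$ is $\psi(\bar{h}) - \sum_{i,s} \bar{h}_{is} m_{is}$, and its stationarity in $\bar{h}$ (equivalently in $\bar{m}$) requires $\partial \psi(\bar{h}) / \partial \bar{h}_{is} = m_{is}$. Hence $\bar{m}^{(-1)}_{is} = \mathrm{Tr}[\rho_{h,w,v}\, \sigma_{is}]$, which is the quantity we have been searching for. But we cannot calculate $\mathrm{Tr}[\rho_{h,w,v}\, \sigma_{is}]$ explicitly due to the difficulty in calculating $\psi(h, w, v)$ for $\rho_{h,w,v}$.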
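If we use the e-projection of $\rho \in \mathcal{S}_3$ to $\mathcal{S}_1$ instead of the m-projection, we have the naive mean-field approximation as in the classical case. To show this, we calculate the e-projection ($+1$-projection) of $\rho$ to $\mathcal{S}_1$. Then we have

$$\begin{aligned}
D^{(+1)}(\rho \| \tau) &= D^{(-1)}(\tau \| \rho) = \mathrm{Tr}[\tau (\log \tau - \log \rho)] \qquad (53) \\
&= \mathrm{Tr}\Big[ \tau \Big\{ \Big( \sum_{i,s} \bar{h}_{is} \sigma_{is} - \psi(\bar{h}) \Big) - \Big( \sum_{i,s} h_{is} \sigma_{is} + \sum_{i<j} \sum_{s,t} w_{ijst} \sigma_{is} \sigma_{jt} + \sum_{i<j<k} \sum_{s,t,u} v_{ijkstu} \sigma_{is} \sigma_{jt} \sigma_{ku} - \psi(h, w, v) \Big) \Big\} \Big] \qquad (54) \\
&= \sum_{i,s} \bar{h}_{is} \bar{m}_{is} - \psi(\bar{h}) - \Big\{ \sum_{i,s} h_{is} \bar{m}_{is} + \sum_{i<j} \sum_{s,t} w_{ijst} \bar{m}_{is} \bar{m}_{jt} + \sum_{i<j<k} \sum_{s,t,u} v_{ijkstu} \bar{m}_{is} \bar{m}_{jt} \bar{m}_{ku} - \psi(h, w, v) \Big\}, \qquad (55)
\end{aligned}$$

where we have used $\bar{m}_{is} = \mathrm{Tr}[\tau \sigma_{is}]$, $\bar{m}_{is} \bar{m}_{jt} = \mathrm{Tr}[\tau \sigma_{is} \sigma_{jt}]$ and $\bar{m}_{is} \bar{m}_{jt} \bar{m}_{ku} = \mathrm{Tr}[\tau \sigma_{is} \sigma_{jt} \sigma_{ku}]$, which hold for distinct sites because $\tau$ is a product state. In the sums below, $w_{ijst}$ and $v_{ijkstu}$ are understood to be extended symmetrically under simultaneous permutation of the site and Pauli indices. Hence

$$\frac{\partial}{\partial \bar{m}_{is}} D^{(+1)}(\rho \| \tau) = \bar{h}_{is} - h_{is} - \sum_{j \neq i} \sum_{t} w_{ijst} \bar{m}_{jt} - \sum_{\substack{j<k \\ j,k \neq i}} \sum_{t,u} v_{ijkstu} \bar{m}_{jt} \bar{m}_{ku}. \tag{56}$$

This gives

$$\bar{h}_{is} - h_{is} - \sum_{j \neq i} \sum_{t} w_{ijst} \bar{m}_{jt} - \sum_{\substack{j<k \\ j,k \neq i}} \sum_{t,u} v_{ijkstu} \bar{m}_{jt} \bar{m}_{ku} = 0, \tag{57}$$

that is,

$$\bar{h}_{is} = h_{is} + \sum_{j \neq i} \sum_{t} w_{ijst} \bar{m}_{jt} + \sum_{\substack{j<k \\ j,k \neq i}} \sum_{t,u} v_{ijkstu} \bar{m}_{jt} \bar{m}_{ku}, \tag{58}$$

and $\bar{m}_{is}$ is given by

$$\bar{m}_{is} = \frac{\partial \psi_i(\bar{h}_i)}{\partial \bar{h}_{is}} = \frac{\bar{h}_{is}}{\|\bar{h}_i\|} \tanh(\|\bar{h}_i\|). \tag{59}$$

Both (58) and (59) together give the naive mean-field equations for third-order QBMs.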
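Equations (58) and (59) can be solved numerically by alternating between them. The sketch below assumes sequential updates from $\bar{m} = 0$, with the Pauli index $s$ stored as positions 0, 1, 2, pair couplings stored as $3 \times 3$ nested lists keyed by sites $(i, j)$ with $i < j$, and triple couplings as $3 \times 3 \times 3$ nested lists keyed by $(i, j, k)$ with $i < j < k$. The data layout, function names, iteration scheme and parameter values are illustrative assumptions; convergence is expected only for sufficiently weak couplings.

```python
import math

def local_field(i, m, h, w, v):
    """Right-hand side of eq. (58): the effective field h_bar_i on site i."""
    field = list(h[i])
    for (a, b), c in w.items():            # c[s][t] couples sigma_{as} sigma_{bt}
        for s in range(3):
            if i == a:
                field[s] += sum(c[s][t] * m[b][t] for t in range(3))
            elif i == b:
                field[s] += sum(c[t][s] * m[a][t] for t in range(3))
    for (a, b, k), c in v.items():         # c[s][t][u] for sites a < b < k
        for s in range(3):
            for t in range(3):
                for u in range(3):
                    if i == a:
                        field[s] += c[s][t][u] * m[b][t] * m[k][u]
                    elif i == b:
                        field[s] += c[t][s][u] * m[a][t] * m[k][u]
                    elif i == k:
                        field[s] += c[t][u][s] * m[a][t] * m[b][u]
    return field

def field_to_mean(h_bar):
    """Eq. (59): m_is = (h_is / ||h_i||) tanh(||h_i||), with m = 0 at zero field."""
    norm = math.sqrt(sum(x * x for x in h_bar))
    if norm == 0.0:
        return [0.0, 0.0, 0.0]
    return [x / norm * math.tanh(norm) for x in h_bar]

def quantum_naive_mean_field(h, w, v, sweeps=500):
    """Alternate eqs. (58) and (59) site by site, starting from m = 0."""
    m = [[0.0, 0.0, 0.0] for _ in h]
    for _ in range(sweeps):
        for i in range(len(h)):
            m[i] = field_to_mean(local_field(i, m, h, w, v))
    return m

# Hypothetical weak couplings for n = 3 spins.
h = [[0.2, 0.0, 0.1], [0.0, -0.1, 0.3], [0.1, 0.1, 0.0]]
w = {(0, 1): [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]],
     (1, 2): [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, -0.1]]}
triple = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
triple[2][2][2] = 0.05
v = {(0, 1, 2): triple}

m = quantum_naive_mean_field(h, w, v)
residual = max(abs(m[i][s] - field_to_mean(local_field(i, m, h, w, v))[s])
               for i in range(3) for s in range(3))
```

Because only $3n$ real numbers are updated, each sweep costs time polynomial in $n$, in contrast to the $2^n \times 2^n$ matrices needed to evaluate $\mathrm{Tr}[\rho_{h,w,v}\, \sigma_{is}]$ exactly.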



6 Concluding remarks 

We have applied information geometry to the mean-field approximation for a general class 
of quantum statistical models. Here, we were able to derive only the naive mean-field 
equations. However, it is known that the naive mean-field approximation does not give a
good approximation to the true value. Therefore, to improve the approximation we need
to consider higher-order approximations, and their treatment from the information
geometrical point of view is left open.